Hierarchical Normal Model & Gibbs Sampling Project
Bayesian Analysis of U.S. Support for Ukraine Aid
knitr::opts_chunk$set(
echo = TRUE,
message = FALSE,
warning = FALSE,
fig.width = 7,
fig.height = 4
)
set.seed(2025)
This project uses Bayesian methods to study attitudes toward U.S. military aid to Ukraine. The data come from a survey by The Economist / YouGov of 1,603 U.S. adults (18+), fielded February 16–18, 2025. Respondents were asked whether the United States should increase military aid to Ukraine, maintain it at about its current level, or decrease it; "not sure" was recorded as a fourth response category.
The published tables report only aggregated percentages by political affiliation: for each party we know the percentage of respondents in each of the four opinion categories, but we have neither respondent-level (micro) data nor exact cell counts.
The goal of this project is not to produce a definitive public-opinion study, but rather to:

- implement Gibbs samplers from scratch for three related Gaussian models (pooled, separate, and hierarchical);
- compare how the models pool, or refuse to pool, information across parties;
- interpret the resulting posterior and predictive distributions, including a prediction for a hypothetical new political affiliation.
These goals align with Module 6: Bayesian Computation in MATH 574, where we learn to implement Markov chain Monte Carlo (MCMC) methods and interpret their results.
We manually input the published percentages below. Each row is a political affiliation and each column is an aid opinion.
aid <- data.frame(
party = rep(c("Democrat", "Independent", "Republican"), each = 4),
opinion = rep(c("Increase", "Maintain", "NotSure", "Decrease"), times = 3),
percent = c(
35, 39, 16, 10, # Democrats
19, 23, 26, 33, # Independents
10, 24, 21, 45 # Republicans
)
)
knitr::kable(aid,
caption = "Survey results: percentage of respondents in each aid opinion by party.")
| party | opinion | percent |
|---|---|---|
| Democrat | Increase | 35 |
| Democrat | Maintain | 39 |
| Democrat | NotSure | 16 |
| Democrat | Decrease | 10 |
| Independent | Increase | 19 |
| Independent | Maintain | 23 |
| Independent | NotSure | 26 |
| Independent | Decrease | 33 |
| Republican | Increase | 10 |
| Republican | Maintain | 24 |
| Republican | NotSure | 21 |
| Republican | Decrease | 45 |
We treat the percent column as observed outcomes on a 0–100 scale. This is a simplification:
in reality, within each party the four percentages are constrained to sum to 100%, and each cell
summarizes many underlying respondent-level outcomes.
However, treating percentages as noisy measurements of an underlying “support level” makes the models directly comparable to the Gaussian hierarchical example in Gelman et al. (BDA3, Ch. 11), which the course materials emphasize.
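As a quick transparency check (a small addition here), we can verify the sum-to-100 constraint directly; the slight excess for Independents is a rounding artifact in the published tables.

# Within each party the four percentages should sum to ~100;
# deviations reflect rounding in the released tables
tapply(aid$percent, aid$party, sum)
## Democrat Independent Republican
## 100 101 100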
A Bayesian analysis is useful here for several reasons:
Small amount of data
We only have 12 cells (3 parties × 4 opinions). A classical frequentist analysis has limited
flexibility with so few aggregated data points. Bayesian methods can combine prior information
with observed data to stabilize estimates.
Group structure (parties)
Democrats, Independents, and Republicans are naturally viewed as three “groups” or “clusters”.
Hierarchical Bayesian models allow us to share information across these groups through partial pooling, and to make predictions about a hypothetical new group drawn from the same population.
Uncertainty and interpretation
Posterior distributions give credible intervals and predictive distributions, which we can
interpret directly as degrees of belief about unknown quantities (e.g., mean support levels).
Our analysis will focus on:

- the latent mean support level \(\theta_j\) of each party under pooled, separate, and hierarchical models;
- the hierarchical hyperparameters: the overall mean \(\mu\), the between-party variance \(\tau^2\), and the within-party variance \(\sigma^2\);
- the posterior predictive distribution of the mean for a hypothetical new political affiliation.
We begin with simple summaries and a couple of quick plots to anchor intuition before building models.
summary(aid$percent)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.00 18.25 23.50 25.08 33.50 45.00
# Wide-format table for convenience
aid_wide <- reshape(
aid,
idvar = "party",
timevar = "opinion",
direction = "wide"
)
aid_wide
library(ggplot2)
ggplot(aid, aes(x = opinion, y = percent, fill = party)) +
geom_col(position = "dodge") +
ylab("Percent of respondents") +
xlab("Opinion on U.S. aid to Ukraine") +
ggtitle("Percent in each opinion category by party") +
theme_minimal()
From this plot:

- Democrats are the most supportive: "Increase" (35%) and "Maintain" (39%) together account for roughly three quarters of their responses.
- Republicans lean the opposite way; "Decrease" (45%) is their single largest category.
- Independents sit in between, with substantial shares in every category and the largest "Not sure" share (26%).
The Bayesian modeling that follows will formalize these patterns, quantify uncertainty, and explore how much the three parties differ in their latent mean “support level”.
We now build three related Bayesian models for the percent outcome:

1. a pooled model with a single common mean;
2. a separate (no-pooling) model with an independent mean for each party;
3. a hierarchical (partial-pooling) model in which the party means share a common population distribution.
To simplify notation, let:

- \(y_k\) denote the \(k\)-th observed percentage, \(k = 1, \dots, 12\);
- \(j(k) \in \{1, 2, 3\}\) denote the party to which observation \(k\) belongs;
- \(\theta_j\) denote the latent mean support level of party \(j\);
- \(\sigma^2\) denote a common observation-level variance.
We assume:
\[ y_k \mid \theta_{j(k)}, \sigma^2 \sim \text{Normal}(\theta_{j(k)}, \sigma^2) \]
where \(\theta_{j(k)}\) is the latent mean of the party containing observation \(k\), with the remaining notation as defined above.
The three models differ in how \(\theta_j\) is structured.
Before writing the Gibbs samplers, we create some convenient objects:
y <- aid$percent
party_index <- as.numeric(factor(aid$party))
parties <- levels(factor(aid$party))
J <- length(parties) # number of parties (3)
n <- length(y) # total observations (12)
table(party_index)
## party_index
## 1 2 3
## 4 4 4
parties
## [1] "Democrat" "Independent" "Republican"
We use weakly informative priors centered in a reasonable range for percentages. Specifically:

- mean parameters (\(\mu\), and the \(\theta_j\) where they have independent priors) receive \(N(m_0 = 50,\ s_0^2 = 30^2)\) priors;
- variance components (\(\sigma^2\), and \(\tau^2\) in the hierarchical model) receive Inverse-Gamma\((0.001,\ 0.001)\) priors.
These priors are not meant to encode strong substantive beliefs, but to prevent numerical instability and to regularize the estimates slightly when the data are limited.
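As a quick sanity check on the scale (an addition here), we can confirm that most of the \(N(50, 30^2)\) prior mass lies inside the admissible 0–100 percentage range:

# Fraction of N(50, 30^2) prior mass inside the 0-100 range
round(pnorm(100, mean = 50, sd = 30) - pnorm(0, mean = 50, sd = 30), 3)
## [1] 0.904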
We also define a small helper to sample from the inverse-gamma distribution using the gamma sampler:
rinv_gamma <- function(n, shape, rate) {
1 / rgamma(n, shape = shape, rate = rate)
}
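A quick check that this helper behaves as intended (a small addition): for shape \(a > 1\), an Inverse-Gamma\((a, b)\) variable has mean \(b/(a - 1)\), so a large Monte Carlo average should land near that value. We save and restore the RNG state so the results below are unaffected by this extra draw.

old_seed <- .Random.seed  # preserve the RNG state so later results are unchanged
mean(rinv_gamma(1e5, shape = 3, rate = 2))  # theoretical mean: 2 / (3 - 1) = 1
.Random.seed <- old_seed  # restore the stream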
We run 8,000 iterations per model and discard the first 2,000 as burn-in.
n_iter <- 8000
burn_in <- 2000
keep_indices <- (burn_in + 1):n_iter
length(keep_indices)
## [1] 6000
The pooled model assumes a single mean \(\mu\) for all observations:
\[ \begin{aligned} y_k \mid \mu, \sigma^2 &\sim N(\mu, \sigma^2), \quad k = 1,\dots,12 \\ \mu &\sim N(m_0, s_0^2) \\ \sigma^2 &\sim \text{Inverse-Gamma}(a_0, b_0). \end{aligned} \]
We choose \(m_0 = 50\), \(s_0^2 = 30^2\), and \(a_0 = b_0 = 0.001\) to be weakly informative.
Given \(\sigma^2\), the conditional for \(\mu\) is normal. Given \(\mu\), the conditional for \(\sigma^2\) is inverse-gamma; this is standard normal–inverse-gamma conjugacy.
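Written out (with \(\bar{y}\) the sample mean of all 12 cells), the two updates implemented in the sampler below are:

\[ \begin{aligned} \mu \mid \sigma^2, y &\sim N\!\left(\frac{n\bar{y}/\sigma^2 + m_0/s_0^2}{n/\sigma^2 + 1/s_0^2},\ \frac{1}{n/\sigma^2 + 1/s_0^2}\right), \\ \sigma^2 \mid \mu, y &\sim \text{Inverse-Gamma}\!\left(a_0 + \frac{n}{2},\ b_0 + \frac{1}{2}\sum_{k=1}^{n}(y_k - \mu)^2\right). \end{aligned} \]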
pooled_gibbs <- function(y, m0 = 50, s0_sq = 30^2,
a0 = 0.001, b0 = 0.001,
n_iter = 8000) {
n <- length(y)
mu <- numeric(n_iter)
sigma2 <- numeric(n_iter)
# Initialize at sample statistics
mu[1] <- mean(y)
sigma2[1] <- var(y)
for (t in 2:n_iter) {
# Update mu | sigma2, y
s_n_sq <- 1 / (n / sigma2[t-1] + 1 / s0_sq)
m_n <- s_n_sq * (n * mean(y) / sigma2[t-1] + m0 / s0_sq)
mu[t] <- rnorm(1, mean = m_n, sd = sqrt(s_n_sq))
# Update sigma^2 | mu, y
a_n <- a0 + n / 2
b_n <- b0 + 0.5 * sum((y - mu[t])^2)
sigma2[t] <- rinv_gamma(1, shape = a_n, rate = b_n)
}
list(mu = mu, sigma2 = sigma2)
}
post_pooled <- pooled_gibbs(y, n_iter = n_iter)
str(post_pooled)
## List of 2
## $ mu : num [1:8000] 25.1 27.3 27.9 26.2 18.8 ...
## $ sigma2: num [1:8000] 122.6 126.2 81.1 133.1 109.6 ...
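As a quick convergence check (a diagnostic we add here), we can plot the trace of \(\mu\) and mark the burn-in cutoff; after the first 2,000 iterations the chain should look like stationary noise around a fixed level.

# Trace plot of mu with the burn-in boundary marked
plot(post_pooled$mu, type = "l", col = "gray40",
     xlab = "Iteration", ylab = expression(mu),
     main = "Trace of mu, pooled model")
abline(v = burn_in, col = "red", lty = 2, lwd = 2)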
The pooled model is intentionally restrictive: it assumes Democrats, Independents, and Republicans all share the same underlying mean support. It serves as a baseline comparison rather than a realistic substantive model.
The separate model allows each party to have its own mean \(\theta_j\), but does not link them:
\[ \begin{aligned} y_k \mid \theta_{j(k)}, \sigma^2 &\sim N(\theta_{j(k)}, \sigma^2), \\ \theta_j &\sim N(m_0, s_0^2) \quad \text{independently for } j = 1,2,3, \\ \sigma^2 &\sim \text{Inverse-Gamma}(a_0, b_0). \end{aligned} \]
Each party is estimated independently, sharing only the common variance \(\sigma^2\).
The full conditional for \(\theta_j\) is normal; the full conditional for \(\sigma^2\) is inverse-gamma based on all residuals.
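Concretely, with \(n_j\) cells and sample mean \(\bar{y}_j\) for party \(j\), the update implemented below is

\[ \theta_j \mid \sigma^2, y \sim N\!\left(\frac{n_j \bar{y}_j/\sigma^2 + m_0/s_0^2}{n_j/\sigma^2 + 1/s_0^2},\ \frac{1}{n_j/\sigma^2 + 1/s_0^2}\right), \]

while \(\sigma^2\) is updated exactly as in the pooled model, using the residuals \(y_k - \theta_{j(k)}\).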
separate_gibbs <- function(y, party_index, labels = NULL,
m0 = 50, s0_sq = 30^2,
a0 = 0.001, b0 = 0.001,
n_iter = 8000) {
J <- length(unique(party_index))
n <- length(y)
theta <- matrix(NA, nrow = n_iter, ncol = J)
sigma2 <- numeric(n_iter)
# Initialize means by party-specific sample means
theta[1, ] <- tapply(y, party_index, mean)
sigma2[1] <- var(y)
for (t in 2:n_iter) {
# Update each theta_j
for (j in 1:J) {
yj <- y[party_index == j]
nj <- length(yj)
s_j_sq <- 1 / (nj / sigma2[t-1] + 1 / s0_sq)
m_j <- s_j_sq * (nj * mean(yj) / sigma2[t-1] + m0 / s0_sq)
theta[t, j] <- rnorm(1, mean = m_j, sd = sqrt(s_j_sq))
}
# Update sigma^2 using all data
resid <- y - theta[t, party_index]
a_n <- a0 + n / 2
b_n <- b0 + 0.5 * sum(resid^2)
sigma2[t] <- rinv_gamma(1, shape = a_n, rate = b_n)
}
if (!is.null(labels)) colnames(theta) <- labels  # label columns with party names if provided
list(theta = theta, sigma2 = sigma2)
}
post_sep <- separate_gibbs(y, party_index, labels = parties, n_iter = n_iter)
str(post_sep)
## List of 2
## $ theta : num [1:8000, 1:3] 25 31.6 29.6 22.3 24.2 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:3] "Democrat" "Independent" "Republican"
## $ sigma2: num [1:8000] 123 199 136 152 106 ...
This model reflects a complete lack of pooling between parties: if one party’s sample mean is extreme, the model will largely retain that extremity, even if the sample is small.
The hierarchical model assumes that party means are related through a common population distribution:
\[ \begin{aligned} y_k \mid \theta_{j(k)}, \sigma^2 &\sim N(\theta_{j(k)}, \sigma^2), \\ \theta_j \mid \mu, \tau^2 &\sim N(\mu, \tau^2), \\ \mu &\sim N(m_0, s_0^2), \\ \sigma^2 &\sim \text{Inverse-Gamma}(a_\sigma, b_\sigma), \\ \tau^2 &\sim \text{Inverse-Gamma}(a_\tau, b_\tau). \end{aligned} \]
Here, \(\mu\) is the overall mean support level across parties, and \(\tau^2\) is the between-party variance that controls how strongly the \(\theta_j\) are pulled toward \(\mu\).
This introduces partial pooling: if evidence for a party is weak, its mean will be pulled toward \(\mu\); if evidence is strong, it can differ substantially from \(\mu\).
The full conditionals are all conjugate: normal for each \(\theta_j\), normal for \(\mu\), and inverse-gamma for both \(\sigma^2\) and \(\tau^2\).
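Concretely, steps 1–4 of the sampler below draw from (with \(\bar{y}_j\) the sample mean of party \(j\), \(n_j\) its number of cells, and \(\bar{\theta} = \frac{1}{J}\sum_j \theta_j\)):

\[ \begin{aligned} \theta_j \mid \cdot &\sim N\!\left(\frac{n_j \bar{y}_j/\sigma^2 + \mu/\tau^2}{n_j/\sigma^2 + 1/\tau^2},\ \frac{1}{n_j/\sigma^2 + 1/\tau^2}\right), \\ \mu \mid \cdot &\sim N\!\left(\frac{J\bar{\theta}/\tau^2 + m_0/s_0^2}{J/\tau^2 + 1/s_0^2},\ \frac{1}{J/\tau^2 + 1/s_0^2}\right), \\ \sigma^2 \mid \cdot &\sim \text{Inverse-Gamma}\!\left(a_\sigma + \frac{n}{2},\ b_\sigma + \frac{1}{2}\sum_{k}(y_k - \theta_{j(k)})^2\right), \\ \tau^2 \mid \cdot &\sim \text{Inverse-Gamma}\!\left(a_\tau + \frac{J}{2},\ b_\tau + \frac{1}{2}\sum_{j}(\theta_j - \mu)^2\right). \end{aligned} \]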
hier_gibbs <- function(y, party_index, labels = NULL,
m0 = 50, s0_sq = 30^2,
a_sigma = 0.001, b_sigma = 0.001,
a_tau = 0.001, b_tau = 0.001,
n_iter = 8000) {
J <- length(unique(party_index))
n <- length(y)
theta <- matrix(NA, nrow = n_iter, ncol = J)
mu <- numeric(n_iter)
sigma2 <- numeric(n_iter)
tau2 <- numeric(n_iter)
# Initialize from simple summaries
theta[1, ] <- tapply(y, party_index, mean)
mu[1] <- mean(theta[1, ])
sigma2[1] <- var(y)
tau2[1] <- var(theta[1, ])
for (t in 2:n_iter) {
# 1) Update theta_j | ...
for (j in 1:J) {
yj <- y[party_index == j]
nj <- length(yj)
s_j_sq <- 1 / (nj / sigma2[t-1] + 1 / tau2[t-1])
m_j <- s_j_sq * (nj * mean(yj) / sigma2[t-1] + mu[t-1] / tau2[t-1])
theta[t, j] <- rnorm(1, mean = m_j, sd = sqrt(s_j_sq))
}
# 2) Update mu | theta, tau2
s_mu_sq <- 1 / (J / tau2[t-1] + 1 / s0_sq)
m_mu <- s_mu_sq * (J * mean(theta[t, ]) / tau2[t-1] + m0 / s0_sq)
mu[t] <- rnorm(1, mean = m_mu, sd = sqrt(s_mu_sq))
# 3) Update sigma^2 | theta, y
resid <- y - theta[t, party_index]
a_sig_n <- a_sigma + n / 2
b_sig_n <- b_sigma + 0.5 * sum(resid^2)
sigma2[t] <- rinv_gamma(1, shape = a_sig_n, rate = b_sig_n)
# 4) Update tau^2 | theta, mu
a_tau_n <- a_tau + J / 2
b_tau_n <- b_tau + 0.5 * sum((theta[t, ] - mu[t])^2)
tau2[t] <- rinv_gamma(1, shape = a_tau_n, rate = b_tau_n)
}
if (!is.null(labels)) colnames(theta) <- labels  # label columns with party names if provided
list(theta = theta, mu = mu, sigma2 = sigma2, tau2 = tau2)
}
post_hier <- hier_gibbs(y, party_index, labels = parties, n_iter = n_iter)
str(post_hier)
## List of 4
## $ theta : num [1:8000, 1:3] 25 25.2 25.2 25.4 24.9 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:3] "Democrat" "Independent" "Republican"
## $ mu : num [1:8000] 25.1 25.2 25.1 25 24.6 ...
## $ sigma2: num [1:8000] 122.6 70 144.5 146.4 75.5 ...
## $ tau2 : num [1:8000] 0.02083 0.00983 0.04959 0.38259 0.06952 ...
In the next section, we will summarize and compare the three models, look at the posterior distributions for party means, and construct a predictive distribution for a hypothetical new political affiliation.
We start by extracting posterior samples for the quantities of interest.
# Discard burn-in and thin if desired
theta_sep_post <- post_sep$theta[keep_indices, ]
theta_hier_post <- post_hier$theta[keep_indices, ]
mu_hier_post <- post_hier$mu[keep_indices]
tau2_post <- post_hier$tau2[keep_indices]
sigma2_post <- post_hier$sigma2[keep_indices]
# Quick summaries
sep_summary <- apply(theta_sep_post, 2, function(x) {
c(mean = mean(x),
quantile(x, c(0.025, 0.5, 0.975)))
})
hier_summary <- apply(theta_hier_post, 2, function(x) {
c(mean = mean(x),
quantile(x, c(0.025, 0.5, 0.975)))
})
sep_summary
## Democrat Independent Republican
## mean 26.21374 26.34508 26.13170
## 2.5% 13.19526 13.55415 12.66676
## 50% 26.08960 26.33411 26.13324
## 97.5% 39.67861 39.68228 39.71870
hier_summary
## Democrat Independent Republican
## mean 25.02483 25.04081 25.02749
## 2.5% 18.17276 18.13495 18.15512
## 50% 24.97307 24.93119 24.96914
## 97.5% 32.07020 32.07443 32.13649
The sep_summary table gives posterior means and 95% credible intervals for each party under
the separate model; hier_summary does the same under the hierarchical model.
To make this easier to read, we collect results into a single data frame:
parties <- colnames(theta_sep_post)
summary_df <- data.frame(
party = rep(parties, times = 2),
model = rep(c("Separate", "Hierarchical"), each = length(parties)),
mean = c(sep_summary["mean", ], hier_summary["mean", ]),
lower = c(sep_summary["2.5%", ], hier_summary["2.5%", ]),
upper = c(sep_summary["97.5%",], hier_summary["97.5%",])
)
summary_df
library(ggplot2)
ggplot(summary_df, aes(x = party, y = mean, color = model)) +
geom_point(position = position_dodge(width = 0.3), size = 3) +
geom_errorbar(aes(ymin = lower, ymax = upper),
position = position_dodge(width = 0.3),
width = 0.15) +
ylab("Posterior mean percentage") +
xlab("Party") +
ggtitle("Posterior means and 95% credible intervals by party") +
theme_minimal() +
theme(legend.position = "bottom")
Interpretation:

- Under both models the posterior means are nearly identical across parties (roughly 25–26). This is largely an artifact of the outcome definition: because each party's four percentages sum to about 100, each party's average cell value is forced to be about 25, so the party-level mean carries almost no information about between-party differences.
- The hierarchical intervals are markedly narrower than the separate-model intervals (quantified just below). With the three party means so similar, the model infers a small between-party variance \(\tau^2\) and shrinks each \(\theta_j\) strongly toward the overall mean \(\mu\). With only four cell values per party, this partial pooling guards against over-interpreting small sample fluctuations.
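To quantify the narrowing, we can compare the average 95% interval widths directly, using the summary_df assembled above:

# Average width of the 95% credible intervals under each model
with(summary_df, tapply(upper - lower, model, mean))
## Hierarchical Separate
## 13.93943 26.55447

The hierarchical intervals are roughly half as wide, reflecting the pooling described above.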
We also examine the overall mean \(\mu\) and variance components.
hyper_df <- data.frame(
parameter = c("mu (overall mean)", "tau2 (between-party var)", "sigma2 (within-party var)"),
mean = c(mean(mu_hier_post),
mean(tau2_post),
mean(sigma2_post)),
lower = c(quantile(mu_hier_post, 0.025),
quantile(tau2_post, 0.025),
quantile(sigma2_post, 0.025)),
upper = c(quantile(mu_hier_post, 0.975),
quantile(tau2_post, 0.975),
quantile(sigma2_post, 0.975))
)
hyper_df
Interpretation:

The posterior for \(\tau^2\) is concentrated near zero: the three party means are nearly identical by construction (each party's four percentages average to about 25), so the model finds essentially no between-party variation in this outcome. This does not mean the parties agree about aid; their very real differences show up in how each party distributes its 100% across the four opinion categories, not in the party-level mean. The upper tail of \(\tau^2\) nonetheless remains poorly constrained, which is expected when only three groups inform a variance component.
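Because variances are hard to read directly, it also helps (a small addition) to summarize \(\tau\) on the standard-deviation, i.e. percentage-point, scale:

# Between-party standard deviation tau, in percentage points
quantile(sqrt(tau2_post), c(0.025, 0.5, 0.975))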
One advantage of the hierarchical model is that we can ask:
If a new political affiliation entered the scene, what mean support level would we expect it to have?
Under the model, a new party’s mean \(\theta_{\text{new}}\) follows:
\[ \theta_{\text{new}} \mid \mu, \tau^2 \sim N(\mu, \tau^2). \]
We approximate this distribution using the posterior draws of \(\mu\) and \(\tau^2\):
theta_new <- rnorm(
length(mu_hier_post),
mean = mu_hier_post,
sd = sqrt(tau2_post)
)
new_party_summary <- c(
mean = mean(theta_new),
quantile(theta_new, c(0.025, 0.5, 0.975))
)
new_party_summary
## mean 2.5% 50% 97.5%
## 25.12772 16.23865 24.97890 34.46476
hist(theta_new, breaks = 30, col = "gray",
main = "Posterior predictive distribution for new party mean",
xlab = "theta_new (mean percentage)")
abline(v = new_party_summary["mean"], col = "red", lwd = 2)
Interpretation:
If a new political affiliation behaved like a “typical” party in this data-generating process, its average aid-opinion percentage would likely lie near the overall mean, with uncertainty captured by the predictive credible interval. The wide spread reflects both the small number of parties and the limited data per party.
From a computational Bayesian perspective, the methods are well-suited:

- every full conditional is conjugate, so the Gibbs sampler needs no tuning, proposal distributions, or acceptance-rate monitoring;
- the models are low-dimensional (at most six unknowns in the hierarchical case), so 8,000 iterations run in seconds and are ample;
- the three samplers share the same normal and inverse-gamma building blocks, so each can be checked against the simpler one before it.
From a substantive public-opinion perspective, there are important caveats:

- the four percentages within each party sum to 100%, so modeling them as independent normal outcomes ignores a hard constraint (a multinomial or Dirichlet model would respect it);
- percentages are bounded in [0, 100], while the normal model places mass outside that range;
- we observe only published aggregates, not respondent-level data, cell counts, or survey weights, so the sampling error in each cell is unknown.
Thus, the numerical results should be viewed as illustrative summaries under a stylized model, rather than as definitive measurements of U.S. opinion on Ukraine aid.
Even though the original survey includes 1,603 respondents, our analysis sees only 12 numbers. This creates several issues:

- the within-cell sampling error, which depends on the unknown cell counts, is absorbed into a single variance \(\sigma^2\);
- with only three groups, the between-party variance \(\tau^2\) is weakly identified and its posterior is sensitive to the prior;
- averaging four constrained percentages per party destroys most of the between-party signal, as seen in the posterior summaries above.
The hierarchical model helps mitigate these concerns by pooling information across parties, but it cannot completely overcome the limitations of the data.
We can trust the analysis to:

- correctly implement and illustrate Gibbs sampling for pooled, separate, and hierarchical Gaussian models;
- demonstrate how partial pooling shrinks group means and narrows credible intervals;
- propagate uncertainty coherently into a predictive distribution for a new group.

We should be cautious in:

- reading the posterior means as substantive measurements of party-level support for aid;
- interpreting the small \(\tau^2\) as evidence that parties agree, since the mean of the four percentages is constrained by construction;
- generalizing beyond this single survey and its unstated weighting scheme.

In short: the project is trustworthy as a computational exercise and illustrative as a modeling exercise, but it is not a substantive estimate of U.S. opinion on Ukraine aid.
This project applied three Bayesian Gaussian models—pooled, separate, and hierarchical—to a small, aggregated dataset on U.S. attitudes toward military aid to Ukraine.
Using Gibbs sampling, we obtained posterior distributions for:

- the party-level means \(\theta_j\) under the separate and hierarchical models;
- the overall mean \(\mu\) and the variance components \(\sigma^2\) and \(\tau^2\);
- the predictive mean \(\theta_{\text{new}}\) for a hypothetical new political affiliation.
From a modeling perspective, this example demonstrates the key ideas of Bayesian hierarchical inference, MCMC computation, and posterior prediction in a compact setting, fulfilling the goals of the course’s computational module.
The Economist / YouGov (2025). U.S. public opinion on Ukraine aid. Survey of 1,603 U.S. adults fielded February 16–18, 2025; publicly released summary tables.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis (3rd ed.). Chapman and Hall/CRC.
Jamshidi, S. (2025). MATH 574 – Bayesian Computational Statistics:
Course materials, modules, and lecture notes, Illinois Institute of Technology.
ChatGPT (OpenAI) (2025). Assistance in drafting narrative structure and boilerplate text, reviewed and validated by the author.